-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A New importer #1518
A New importer #1518
Conversation
parent = os.path.join(parent, part) | ||
try: | ||
os.mkdir(parent) | ||
except FileExistsError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, Python 2 doesn't support this
@@ -280,19 +280,19 @@ def test_bin_hy_no_main(): | |||
@pytest.mark.parametrize('scenario', [ | |||
"normal", "prevent_by_force", "prevent_by_env"]) | |||
@pytest.mark.parametrize('cmd_fmt', [ | |||
'hy {fpath}', 'hy -m {modname}', "hy -c '(import {modname})'"]) | |||
'hy -m {modname}', "hy -c '(import {modname})'"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something worth noting: python foo.py
won't emit bytecode for foo.py
. We now match that behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made hy foo.hy
produce bytecode on purpose. The speed boost is a lot bigger for Hy than Python, so you shouldn't have to import from the file rather than running it directly in order to get bytecode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, I'll revert that and add documentation for it.
Add the necessary bits to provide a Python 3.4+ compliant import machinery. This means we now have our own PathFinder, which we register as in sys.meta_path, our own FileFinder, and our own Loader. Add a compatability later that builds on the Python 3 machinery to make it best-effort backwards compatible with Python 2.
This one has been sitting on the sidelines too long, hopefully I can wrap this one up today. Moving documentation to org and a gist, I'll convert it to rst before merging: https://gist.github.com/vodik/efe2310b8d0e5c065a1a38e582d4e580 |
We can't reply on hy_eval here as it doesn't properly associate the appropriate file information. Fixes hylang#1395
This is still coming along, I just hate writing documentation. |
Hi @vodik, I was actually working on implementing a working importer for hy in palimport when I found out this PR. I m not yet familiar with hylang, but it looks like you reimplemented an importlib... It might be necessary or not, not sure how much you want to deviate from python import semantics (namespaces packages, bytecode location and all that...) But I though you might want to know about filefinder2, which I maintain, as I need custom imports (py2/py3) for other projects. The goal is to eventually phase it out when py3 takes over, but it currently is a good base to start when writing a custom importer. Reusing filefinder2 might help you implement importlib by relying on python3, without too much code for py2 compatibility. Anyway, Let me know if I might be of some help here. |
FYI, I got a simple importer to work on a simple usecase there : https://github.com/asmodehn/palimport/pull/6/files |
Hey, yeah. So most of the reimplementation is for Python 2 compatibility. Drop Python 2 and 3/4 of that codebase disappears - I can rely on importlib directly. My intention was to reuse the loader between Python 2 and 3, and then implementing the corresponding import protocols. I might have done stuff wrong, but the loader should respect Python 2 conventions, like how submodules as supposed to be loaded recursively, while also leveraging Python 3.4+ semantics on Python 3 with the new module spec system. Bytecode generation and related stuff is there because, as far as I can tell, its not exposed in Python 2. Only in 3.
I only deviate with namespace packages. Maybe you can help me here, but as far as I can tell, I have to break namespace modules. This is a Python 3 bug. In theory, Hy should work just fine if I add a new Path/File finders paired with the appropriate loader. But this actually breaks Python. Here's the problematic function inside https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py#L717 def is_package(self, fullname):
"""Concrete implementation of InspectLoader.is_package by checking if
the path returned by get_filename has a filename of '__init__.py'."""
filename = _path_split(self.get_filename(fullname))[1]
filename_base = filename.rsplit('.', 1)[0]
tail_name = fullname.rpartition('.')[2]
return filename_base == '__init__' and tail_name != '__init__' Notice how its not actually checking for The workaround is to that Hy must have its own meta loader, which needs to come first, and - to avoid the problem of then loading (Well, strictly speaking, meta loader isn't necessary here - but its convenient for being able to handle overlapping Py/Hy packages when different submodules are in different languages - which we rely on internally.)
So long as you do a good job of matching Python 2's expected behaviour (instead of backporting 3), I'm game for trying to merge what I've learned here back. And if there's interest in using it besides me, I don't see a problem with it then. |
Oh right, part of my slowness here is I'm trying to draft documentation covering how all this works, the why, and the gotchas. Might as well relink it: https://gist.github.com/vodik/efe2310b8d0e5c065a1a38e582d4e580 Feedback is welcome if I got anything obviously wrong. But this is certainly a topic where resources as scarce and I kinda wish I found your stuff earlier. Having a single project for this probably makes a lot of sense. |
I would be more careful about respecting python3 conventions... because the end is near.
Yes, and it is good to have it (if you need it). I didn't implement that in filefinder2 because I had no usecase for it. Meaning, filefinder2 will use the one in python interpreter (2 or 3, the one that is available), I had no need to modify it yet, so no need to reimplement it there. Also I don't care much about having bytecode or not in python2. Talking about it now, I wonder if I even have test cases for it... If hylang needs bytecode, it would be great if you can find some time to integrate that in filefinder2( with test cases !). That package is specifically for python 2/3 compatibility, and deviation from py3 will need to be implemented outside of filefinder2 (inheriting from filefinder2 where suitable). Implementing that might be quite an adventure however...
I don't think so (or I don't get the problem yet...)
I do backport python3, to match its behavior by default, but another goal is for anyone to be able to modify (by inheritance) the behavior where needed, to :
Oh yes, please do. Also there are tests there to validate many usecases in all (supported) python versions, and that can help a lot not to break someone's python because of a small Also the more use cases we have for filefinder2, the cleaner it will be, and python3 import upgrades will be less messy, when we will have to finally let python2 die. |
Okay, writing this in a rush, so hopefully I'm clear. Sorry if not.
Did you mean respecting Python 2 conventions? Yes, I realize the end is near. If it where up to me, we'd drop Python 2 tomorrow. But in the meanwhile, I have to support it. And, in my humble opinion, the best way to do it is to match Python 2's loader's behaviour (not backport Python 3's behaviour), and do it in such a way that a simple patch will drop code, rather than requiring rewites. As for recursive:
Source: https://chaobin.github.io/2015/06/22/understand-import-system-of-python/ ^ Python 3 doesn't do this anymore. I forgot where this changed, its been about 3 months since I've last had the time to dig into this. I think it came with the newer module spec logic that came with 3.3/3.4.
It's not actually modified at the moment. I toyed with the idea and reverted it. But I've been meaning to propose reintroducing it. For example, if we where to embed a small header at the start of the bytecode file, we can choose to invalidate bytecode between Hy releases that would break things. For example, we recently went through a mangling scheme change. But this change meant all pre-existing bytecode files had the old names and thus broke. If we had control over the bytecode header, we could have at least forced Hy to invalidate. I have a few other ideas here, but its not a simple topic and needs to be hashed out with everyone here, so I'm not making that change yet.
Yeah, you're right. Its been a while since I last digged into it - its not there. Its here: https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py#L1388 Its not in the Namespace module should be designated last, after all loaders give up. If we had that behaviour, we could embed Hy as just another This is why I think we're sitting on a Python bug. The current implementation of namespace modules violate spirit of PEP 302. Let me devise a small example to demonstrate the problem: import contextlib
import os
import sys
from importlib.machinery import FileFinder, SourceFileLoader
# So, this is counter intutive, but we're going to add a new *loader*,
# that we want to go *first*, before .py.
#
# Its easier to highlight the problem this way
sys.path_hooks.insert(0, FileFinder.path_hook((SourceFileLoader, [".badpy"])))
with contextlib.suppress(FileExistsError):
os.mkdir("good")
with contextlib.suppress(FileExistsError):
os.mkdir("bad")
with contextlib.suppress(FileExistsError):
os.mkdir("contrast")
# We'll always be able to directly find modules if we don't do
# packages: good/demo.py -> good.demo works. We fall though our loader
# and the standard ones afterwards do their thing
#
# This works because the full filename matches, so __init__ stuff is bypassed.
# That's the fallback behaviour when an exact match isn't found
with open("good/demo.py", "w") as module:
module.write("""print("Hello World")""")
import good.demo
print(good.demo) # <module 'good.demo' from '/tmp/good/demo.badpy'>
# But, we can no longer import packages!
#
# This loads as a namespace module now. Why? Because the first loader
# (.badpy one) will now will look for bad/__init__.badpy, doesn't find it, but
# decides that since we're insider a folder, we a namespace
#
# No more chaining of loaders anymore, we shutdown too early - even
# though the next one afterwards would be more than happy to do its thing.
with open("bad/__init__.py", "w") as module:
module.write("""print("Hello World")""")
import bad
print(bad) # HA! <module 'bad' (namespace)>
# But we can still load modules if we use `__init__.badpy`.
with open("contrast/__init__.badpy", "w") as module:
module.write("""print("Hello World")""")
import contrast
print(contrast) # <module 'contrast' from '/tmp/contrast/__init__.badpy'>
Have a look at my design - most of the stuff is literally pulled out from Python 3.6 CPython's code base. The I haven't had the chance to audit your code yet, but I am interested in digging into it.
Well, it just shunts it around 😉
Finding the time to write test cases and documentation is actually what's I haven't found time to dig into at the moment. But all things aside, this seems to be an archaic topic, with barely any good comprehensive resources, and ended up having to dig into CPython 2.7/3.6 source code to understand how the systems worked. Having a high quality, well documented library would be a boon. |
Please, double check my work. Now that I put the effort together to write a concrete example, I should be in good shape to open a Python issue. Its been three months since I really dug deep into this and I was already finding myself rediscovering the importlib mechanisms to explain this. |
Hehe, that can be a very long debate. In my case I would think that having the user explicitly import in his package
Ah I get your point. I also had a problem with this, and I agree with you in the sense that it took me by surprise at first. I wasn't expecting this design at all. But I guess I come at it from a slightly different angle. My problem is that I want to add (and potentially remove) a
So I had to go with option2 : when I implement a custom importer, I manage the "hierarchy/subfolder" logic in I also override Note in this simple hylang importer implementation I did not handle the (namespace or not) package case, as I wasn't sure of the desired behaviour. It might be that the inconvenient implementation (for my usecase) of python importlib right now is a major hurdle for your usecase...But I did change before the namespace package semantic for an importer (rosimport), as it was really necessary. But it turned out to be so unexpected and tricky to follow, that I vowed to never do it again, if I can help it ;-). I find that Python3 has a sensible subdirectory import semantic, when it comes to modular scripting language anyway. Looking deeper into the code I found out this point also :
I would (naively - can't remember if I ever tested it) assume that :
Feel free to take or reuse all the tests in filefinder2. You will also need to box (pytest-forked) your tests, so the state of your interpreter doesn't get polluted by previous tests. One of the problem here is that the volume of files and folder grows quickly, which is another reason why I decided to have that in a separate package.
Oh yes. sadly, documentation is not really my forte either... I will now have a detail look at the code you sent here, and run it in a few different way. I ll soon post some kind of report on the gist comments. |
@vodik, I just copied this branch and made some updates for those conflicting files: https://github.com/brandonwillard/hy/tree/new-importer |
@brandonwillard vodik hasn't been active on Hy lately, so if you want to work on these changes and see them happen, I'd recommend opening a new PR. |
I really do want to see this go through, so I'll do that. |
I think this was superseded by #1672. |
This PR replace the loader with a more correct implementation, and then introduces some refactoring work.
I've added the necessary bits to provide a Python 3.4+ compliant import machinery (works on Python 3.3 as well). This means we now have our own:
PathFinder
, which we register as asys.meta_path
entryFileFinder
which is responsible for finding Hy code in the filesystemLoader
, which matches the Python implementation but for Hy instead of PythonThere's also a compatibility later for Python 2:
Importer
to sit insys.meta_path
and properly implements the recursive module loading mechanismimp.find_module
expects.Loader
that makes the Python 3 loader compatible with Python 2.On top of that, functionality from Python 3 is backported to two, like
_atomic_write
, to improve the correctness of the code that writes bytecode to disk.With these changes, the following issues are address:
importlib.reload
not works for hy module #1480 (IPython%autoreload
looks for .py extension, if you lift that requirement, autoreload also works. Probably should open an issue with IPython - I have a patch)__main__
is handled correctly __main__ is not handled correctly when a program is run with hy program.hy #418 __main__.hy doesn't work like __main__.py #1466I want to spend a bit more time refactoring
hy_compile
,ast_compile
,hy_parse
andhy_eval
as I want to expose a Hy API with similar signature as Python'sparse
,compile
, andexec
.But I think its in a reviewable state as it is.
There are still a few outstanding tasks:
hyc
- currently stubbedSome decisions that have to be made still:
runpy.run_path
(requires monkeypatching, see_get_code_from_file
) (currently leaning on no, but its implemented)runpy.run_path
? It meanshy foo.py
would work as well ashy foo.hy
, but maybe again that's desirable (currently leaning on no, but its implemented)Should we drop Python 3.3 support? While the loader is compatible with Python 3.3, the correct approach to loading modules manually with importlib by filename is different (Python 3.4 made the find_spec API public). Mostly has implications for unit testing, but I think its old enough we can drop it.For some reason I thought we supported 3.3. Never mind then.